Abstract


A new approach to autonomous vehicle perception is presented which solves the
historically significant throughput problem at contemporary speeds through
computational stabilization of the sensor sweep. This adaptive approach to
perception has made it possible to achieve unprecedented autonomous vehicle
speeds at little or no cost to other aspects of performance.

In order to measure the local environment at sufficient resolution and sufficient
rate, an autonomous vehicle requires computational throughput on the order of
O[TV^] where T is the vehicle reaction time and V is the velocity. On the other
hand, the traditional approach of nonadaptive range image processing requires
throughput on the order of O[T^ V^]. The product TV is on the order of 10 for a
conventional automobile so the difference between these two expressions is four
orders of magnitude at 20 mph. Nonadaptive range image processing requires
about 1 gigaflop in order to achieve 20 mph speeds whereas the algorithm
presented here requires 1/10 of 1 megaflop under identical assumptions.

This report concentrates on the adaptive perception algorithm which forms the
basis of RANGER's Map Manager object. The techniques used should be
applicable to any application that models the environment with a terrain map.
1. Introduction


One  of  the  distinguishing  characteristics  of  high  speed  autonomy  is  that  the  ratio  of  sensor  height 
to  vehicle  stopping  distance  is  small  and,  as  a  result,  range  pixels  normally  intersect  the  terrain  at 
glancing  angles.  On  the  surface,  this  causes  many  problems,  but  finer  analysis  indicates  that  a 
consistent  application  of  the  small  incidence  angle  assumption  leads  to  solutions  to  some  of  those 
same  problems.  A  second  aspect  of  the  problem  is  that  the  vertical  field  of  view  of  imaging  sensors 
is  normally  aligned  with  the  direction  of  travel  and  therefore  image  sequences  normally  contain 
massively  redundant  data.  A  direct  management  of  data  redundancy  through  surprisingly  trivial 
adaptive  perception  techniques  leads  to  theoretical  near  minimum  perceptual  throughput. 

A  simple  elegant  active  perception  algorithm  is  introduced,  based  on  these  assumptions,  which 
computationally  stabilizes  the  sensor  sweep.  It  solves  the  historically  significant  throughput 
problem  in  practical  terms,  and  is  four  orders  of  magnitude  more  efficient  than  straightforward 
complete  processing  of  all  images  at  20  mph.  In  converting  any  range  imaging  sensor  to  an  ideal 
adaptive  line  scanner,  it  obviates  the  need  for  parallel  processing  in  perception  and  makes 
unprecedented  vehicle  speeds  possible  on  general  purpose  computer  hardware.  The  system 
achieves  the  throughput  necessary  for  20  mph  rough  terrain  autonomy  on  a  typical  engineering 
workstation. 

The  key  problem  of  range  image  perception  is  that  the  position  of  the  end  of  a  range  pixel  is 
unknown  until  it  is  computed,  and  the  computation  of  its  location  is  the  largest  element  of  the 
computational  expense  of  a  pixel.  Any  attempt  to  selectively  process  data  in  an  area  of  interest 
falters  because  the  problem  of  selection  is  as  difficult  as  the  problem  of  perception.  Luckily,  for 
high  speed  autonomy,  there  is  a  key  assumption,  the  small  incidence  angle  assumption,  which 
allows the circle to be broken, and allows selective processing of image data.

Adaptive  perception  is  implemented  for  laser  rangefinder  images  by  actively  searching  for  the  data 
that  lies  in  a  range  window  in  the  image.  Adaptive  perception  is  implemented  for  stereoscopy  by 
noticing that a range window corresponds to a disparity window. Other techniques are available
to  improve  matters  even  further. 

RANGER  is  an  acronym  for  Real-time  Autonomous  Navigator  with  a  Geometric  Engine.  This 
report  describes  the  adaptive  approach  to  perception  which  allows  the  RANGER  system  to  support 
unprecedented  vehicle  speeds  while  still  guaranteeing  vehicle  safety. 

1.1 Acknowledgements

None of this work would have been possible without the guidance of Martial Hebert. Martial taught
me terrain mapping and laser rangefinders, and guided the development of adaptive stereo. Many
conversations with Martial led me into the search for a different way of doing range image
perception.
1.2 Commentary

The  throughput  problem  of  autonomous  navigation  has  long  been  regarded  as  a  fundamental 
limitation  and  many  organizations  continue  research  into  parallel  processing  approaches  to  the 
problem.  This  view  of  the  difficulty  of  the  problem  is  a  natural  conclusion  if  we  adopt  the 
assumption  that  entire  images  need  to  be  processed  at  high  rate.  There  are  situations  where  this 
assumption is valid, and others where it is not.

The image processing view is necessary when the environment is dynamic and unpredictable
because then the vehicle cannot predict the motion of important environmental features. It has to
look around for them. It is also necessary in order to track stationary or moving environmental
features for its own sake or for position estimation purposes.

Another  justification  for  the  image  processing  view  is  the  poor  quality  of  images  relative  to  the 
application  requirements.  In  this  case,  estimation  theory  can  be  used  to  merge  redundant 
measurements  into  an  acceptable  overall  estimate  of  the  state  of  the  environment. 

Under  an  assumption  of  a  static  environment,  a  moving  vehicle  can  reasonably  expect  an  obstacle 
to  stay  put,  and  to  therefore  “scroll  by”  as  the  vehicle  itself  moves.  All  other  things  being  equal,  an 
obstacle  needs  to  be  seen  only  once,  provided  the  vehicle  can  track  its  own  motion  accurately 
enough,  and  provided  the  perception  of  the  nature  and  location  of  the  obstacle  is  of  adequate 
fidelity. 

Even  if  the  environment  is  not  self-stationary,  there  is  a  second  justification  for  a  selective  approach 
to  environmental  perception  at  high  speed.  The  capacity  of  the  vehicle  to  react  to  significant 
external  events  becomes  ever  more  challenged  as  speeds  increase.  On  this  basis,  it  is  possible  to 
waste  most  or  all  perception  cycles  trying  to  see  obstacles  that  cannot  be  avoided  anyway.  Thus, 
the  processing  of  perception  data  that  corresponds  to  regions  too  close  to  the  vehicle  is 
fundamentally  unsound  for  obstacle  detection  and  avoidance  purposes. 

In  extreme  cases,  this  logic,  coupled  with  the  extreme  density  of  image  data  near  the  vehicle,  leads 
to  the  conclusion  that  most  perception  processing  is  completely  useless  at  high  speeds.  Further, 
limited  throughput  is  the  practical  justification  for  poor  angular  resolution  -  because  high  resolution 
data  cannot  be  processed  anyway.  Therefore,  the  historical  tradeoff  has  been  to  provide  inadequate 
angular  resolution  at  the  high  ranges  where  it  is  needed  in  order  to  provide  high  resolution  data 
close  to  the  vehicle  -  when  it  is  too  late  to  use  it. 

To  accept  the  idea  of  adaptive  perception  is  to  also  accept  the  new  requirement  to  remember  what 
was  seen  previously.  This  requirement  for  a  “map”  of  some  kind,  justified  on  throughput  grounds, 
can also be justified on a more fundamental level. The idea of selective, or active, or adaptive
perception  has  been  around  for  some  time.  It  is  clearest  in  the  systems  which  track  features  from 
image  to  image  based  on  predictor/corrector  algorithms  and  small  search  windows  that  move 
between  images.  Here,  this  idea  will  be  applied  to  the  problem  of  terrain  mapping  in  outdoor  rough 
terrain. 
2. Analytical Basis of the Concept

This  section  presents  some  of  the  key  considerations  which  drive  the  design  of  the  perception 
system.  These  are  presented  in  more  detail  in  [15]. 

2.1 Guaranteed Safety

In  order  to  guarantee  that  the  vehicle  remains  safe,  the  automatic  control  system  must  ensure  that 
the  following  four  requirements  are  met  simultaneously  for  all  time: 

•  Guaranteed  Response  -  The  system  must  ensure  that  important  environmental  events  are 
perceived  in  time  to  respond  accordingly. 

•  Guaranteed  Throughput  -  The  system  must  ensure  that  it  never  drives  over  unknown 
terrain. 

•  Guaranteed  Detection  -  The  system  must  be  able  to  resolve  the  smallest  feature  of  the 
environment  that  can  present  a  hazard. 

•  Guaranteed  Localization  -  The  system  must  be  able  to  locate  hazards  relative  to  the  vehicle 
to  sufficient  accuracy. 

2.2  Throughput  Problem 

The  throughput  required  to  process  an  image  depends  on  the  number  of  pixels  in  the  image.  The 
number  of  pixels  depends  on  the  field  of  view  and  resolution.  Resolution  depends  on  the  acuity 
requirement  which  implies  it  depends  on  range.  The  response  requirement  implies  that  range 
depends  on  speed  so  that  resolution  depends  on  speed.  The  throughput  requirement  also  implies 
that  field  of  view  depends  on  speed.  So  ultimately,  throughput  can  be  expressed  solely  in  terms  of 
speed. 

With an analysis of response and acuity it is possible to analyze the computational complexity of
perception.  In  intuitive  terms,  guaranteed  response  implies  that  throughput  is  proportional  to  a  high 
power  of  velocity  because: 

•  Maximum range increases quadratically with speed (because braking distance does)

•  Required pixel size shrinks quadratically with maximum range (in order to resolve obstacles)

•  Throughput increases with the inverse square of pixel size (assuming fixed field of view)

This relationship is indicated in the following figure:

[Figure 1 - Throughput Problem: speed, sensor range, angular resolution, flops]


When  throughput  is  limited,  this  relationship  gives  rise  to  a  trade-off  between  speed  and  resolution. 
Naive  analysis  suggests  that  the  problem  of  high  speed  navigation  is  nearly  impossible,  because 
the  necessary  throughput  is  impractical.  This  will  be  called  the  throughput  problem. 
2.3 The Illusion

From  an  image  processing  perspective,  the  throughput  problem  appears  to  be  impossible.  Consider 
that  contemporary  rangefinders  are  10  mrad  resolution  and  many  researchers  believe  that  1  mrad 
or  so  is  needed  to  resolve  obstacles  for  conventional  automobiles.  A  ten  fold  increase  in  resolution 
is  a  hundredfold  increase  in  pixels  and  a  hundredfold  increase  in  required  throughput.  Today,  it  is 
not  possible  to  process  10  mrad  images  fast  enough  on  a  10  Mflop  processor.  Therefore,  if 
resolution  were  increased  tenfold,  it  would  be  impossible  to  process  1  mrad  images  on  a  1  Gflop 
processor.  Brute  force  is  not  the  elegant  way  to  solve  this  problem. 

On  the  other  hand,  the  raw  requirement  is  the  throughput  requirement,  and  this  is  trivial  to  meet. 
Consider that a 5 m/s vehicle covers about 6 map cells^1 between images at 2 Hz, so there is a band
in the image about six pixels wide^2 which would supply exactly the needed steady state throughput.
This  section  will  show  that  the  throughput  problem  is  an  illusion  which  arises  from  an  image 
processing  view  of  the  problem  and  that  simple  adaptive  techniques  can  solve  it  completely  at 
contemporary  sensor  resolutions. 

2.4 Adaptive Perception

It is possible to solve the throughput problem while simultaneously guaranteeing safety as
efficiently as is theoretically possible by employing four principal mechanisms.

•  Adaptive  Lookahead  is  a  mechanism  for  guaranteeing  that  the  vehicle  can  respond  to  any 
hazards  that  it  may  encounter  at  any  speed. 

•  Adaptive  Sweep  is  a  mechanism  for  guaranteeing  barely  adequate  throughput  and  the  fastest 
possible  reaction  time.  In  this  way,  speed  is  maximized. 

•  Adaptive  Scan  is  a  mechanism  for  ensuring  barely  adequate  resolution  that  is  as  constant  as 
possible  over  the  field  of  view.  In  this  way,  speed  is  maximized  without  compromising 
robustness of the system.

•  Adaptive  Regard  is  a  mechanism  for  ensuring  that  the  system  minimizes  the  spatial  extent  of 
the  region  it  perceives  based  on  vehicle  maneuverability  so  that  speed  is  maximized  without 
compromising  safety. 

Together, these mechanisms can increase the efficiency (measured in terms of range pixel
throughput) of a system by four orders of magnitude at 20 mph while simultaneously making it
considerably more robust.


1.  Assuming  1/2  meter  cell  resolution. 

2.  Assuming  the  image  resolution  is  adequate  to  land  one  pixel  in  a  map  cell. 
2.5 The Fundamental Speed/Resolution Trade-off

Identical  resolution  assumptions  lead  to  the  following  throughput  estimates  for  different  image 
processing  algorithms: 


Table 1: Throughput Estimates

Algorithm              Estimate at Minimum Acuity,    Complexity
                       4 second Reaction Time,
                       and 10 m/s speed
---------------------  -----------------------------  ----------
constant flux          250 Mflops
adaptive sweep         0.7 Mflops
adaptive sweep, scan   0.035 Mflops
ideal                  0.0045 Mflops

The  actual  data  for  all  4  second  reaction  time  curves  is  plotted  below  on  a  logarithmic  vertical 
scale. 


[Figure 2 - Throughput for All Algorithms (horizontal axis: velocity in meters/sec; vertical axis: throughput, logarithmic scale)]


The  logic  of  decreasing  pixel  size  for  higher  speeds  is  inescapable,  but  equivalent  logic  leads  to  a 
reduced  vertical  field  of  view  requirement,  so  if  the  vertical  field  of  view  is  not  reduced,  extreme 
throughput  waste  is  being  tolerated.  Further,  because  pixel  aspect  ratio  is  extremely  elongated  at 
high  ranges,  the  density  of  measurements  in  the  crossrange  direction  is  grossly  suboptimal  unless 
it  is  managed. 
Notice that the complexity in either of the above cases contains a constant times a power of the
product TV. That is:

$$ \text{throughput} \propto (TV)^{n} V^{m} = T^{n} V^{n+m} $$

This  will  be  called  the  fundamental  trade-off  because  it  indicates  that  the  trade-off  of  finite 
computing  resources  is  one  of  reliability  for  speed.  This  is  a  basic  trade-off  of  speed  and  resolution 
which always arises from a system throughput limit. Computing resources establish a limit on
vehicle  performance  which  can  be  expressed  as  either  high  speed  and  low  reliability  or  vice  versa. 

There are a few ways to read the result. If throughput is fixed, then speed is inversely proportional
to  reaction  time.  If  speed  is  fixed,  throughput  required  is  the  nth  power  of  reaction  time.  If  reaction 
time  is  fixed,  throughput  is  the  (n+m)th  power  of  speed. 

Small  Incidence  Angle  Assumption 

It turns out that the low perception ratio h/R which causes problems with respect to scanning
density  gives  some  payback  with  respect  to  adaptive  regard  and  the  sampling  problem.  On  the 
surface,  adaptive  regard  appears  to  be  impossible  because  the  mapping  from  image  space  to  world 
space  is  unknown  until  it  is  computed.  Once  a  pixel  is  computed,  it  might  as  well  be  used.  This 
logic  leads  to  processing  the  entire  image  and  it  is  doomed  to  failure  as  was  shown  earlier.  The  only 
remaining  question  is  how  to  do  it. 

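Luckily, the range measurement from the sensor to the environment is almost identical to its
groundplane projection (because the angle involved is so shallow). Indeed, for a sensor at height h
above flat terrain, the relative error in treating the range R as its groundplane projection is

$$ \frac{R - \sqrt{R^{2} - h^{2}}}{R} \approx \frac{1}{2} \left( \frac{h}{R} \right)^{2} $$

This factor is on the order of 1% for high speeds simply because the range is so large. The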
assumption  of  a  small  perception  ratio  will  be  called  the  small  incidence  angle  assumption.  If  the 
perception  system  attempts  to  process  all  geometry  within  a  range  window,  the  quest  for  the  end 
of  the  range  window  will  automatically  walk  right  to  the  top  of  the  image  if  necessary  and,  as  a  side 
effect,  it  will  discover  the  height  of  a  near  vertical  surface  as  long  as  it  remains  within  the  range 
window.  This  mechanism  is  far  superior  to  simply  processing  a  fixed  subset  of  the  vertical  field  of 
view  (an  elevation  angle  window)  in  part  because  of  its  performance  on  near  vertical  surfaces. 
Such surfaces would fall outside an elevation angle window and would not be completely
processed. 

The processing of pixels outside the range window consists of nothing more than reading their
values and comparing them to the window. Further, because a terrain map already makes a 2-1/2 D
world assumption, the range values can be consistently assumed to be monotonic in elevation angle
and this further reduces the required processing because some pixels need never be visited at all.
This monotone range assumption also provides the basis for ambiguity removal in phase
ambiguous sensors like AM rangefinders.

It was shown in [15] that as the minimum range increases, the scanning density decreases
quadratically  and  approaches  one.  Therefore,  because  adaptive  regard  discards  high  depression 
scanlines  anyway,  the  sampling  problem  is  far  less  severe  at  high  speed. 
3. Adaptive Perception From Range Images

In  the  case  of  laser  rangefinders,  a  range  image  is  generated  in  hardware.  For  this  class  of  sensors, 
adaptive  perception  is  limited  to  extraction  of  the  range  window  from  the  complete  image.  The 
adaptive  perception  algorithm  constitutes  a  simultaneous  implementation  of  adaptive 
lookahead,  adaptive  sweep,  and  adaptive  scan  in  one  place.  It  proceeds  in  two  phases;  the 
computation  of  the  range  window  based  on  the  current  speed  and  cycle  time,  and  the  mapping  of 
this  window  into  image  space  and  the  extraction  of  the  data  from  the  image  while  converting  it  back 
into  world  coordinates. 

3.1 Range Window Computation - First Phase

A  range  window  is  computed  which  computationally  points  the  sensor  vertical  field  of  view  based 
on  the  guaranteed  response  and  guaranteed  throughput  requirements.  Adaptive  lookahead  is 
implemented by computing the distance required to execute an impulse turn^3 at the current speed.
This  gives  the  maximum  range  of  the  range  window.  Adaptive  sweep  is  implemented  by 
computing  the  distance  travelled  based  on  the  cycle  time  and  the  speed  and  subtracting  this 
distance  from  the  maximum  range.  This  adaptive  sweep  algorithm  will  automatically  adapt  to  any 
system  cycle  time  or  sensor  frame  rate  using  small  sweeps  for  faster  sensors. 

The  true  range  window  must  be  modified  for  three  effects.  First,  the  sensor  is  not  mounted  at  the 
vehicle  control  point,  so  the  above  range  window,  the  planning  window,  is  adjusted  for  the  offset 
of  the  sensor  in  order  to  project  the  window  into  sensor  coordinates.  Further,  the  vehicle  is  not  itself 
a  point,  so  adaptive  regard  is  implemented  by  adding  the  vehicle  wheelbase  to  the  maximum  range 
so  that  the  largest  possible  hazard,  a  pitch  hazard,  can  be  detected  in  the  detection  zone  beyond  this 
distance.  Third,  there  may  be  significant  delay  associated  with  the  acquisition  of  an  image,  so  the 
range  measurements  are  adjusted  for  the  age  of  the  image.  This  is  the  true  range  window.  A 
conceptual  C  code  fragment  is  as  shown  below: 

/*
** Plan Window
*/
Pmax = speed * treact + rhomin;      /* adaptive lookahead */
Pmin = Pmax - speed * cycle_time;    /* adaptive sweep */

/*
** Range Window
*/
Rmax = Pmax + speed * sensor_latency - sensor_y + wheelbase;
Rmin = Pmin + speed * sensor_latency - sensor_y;


The  fact  that  the  last  line  does  not  subtract  the  wheelbase  is  a  practical  measure  to  ensure  that  all 
geometry  used  in  planning  comes  from  one  image.  This  solves  the  image  registration  problem  at  a 
slight  cost  in  redundant  computation. 
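
The first phase can be packaged as a small self-contained function. The sketch below reuses the variable names of the conceptual fragment; the `WindowInputs` and `RangeWindow` types, the function name, and the units in the comments are assumptions introduced only for illustration.

```c
/* Inputs to the range window computation (hypothetical grouping). */
typedef struct {
    double speed;           /* current vehicle speed, m/s */
    double treact;          /* reaction time, s */
    double rhomin;          /* rhomin term of the plan window, m */
    double cycle_time;      /* system cycle time, s */
    double sensor_latency;  /* age of the image, s */
    double sensor_y;        /* offset of the sensor from the control point, m */
    double wheelbase;       /* vehicle wheelbase, m */
} WindowInputs;

typedef struct {
    double rmin;            /* near edge of the range window, m */
    double rmax;            /* far edge of the range window, m */
} RangeWindow;

RangeWindow compute_range_window(const WindowInputs *in)
{
    /* Plan window: adaptive lookahead, then adaptive sweep. */
    double pmax = in->speed * in->treact + in->rhomin;
    double pmin = pmax - in->speed * in->cycle_time;

    /* Range window: correct for image age and sensor offset; the wheelbase
     * is added to the far edge only, as in the fragment above. */
    RangeWindow w;
    w.rmax = pmax + in->speed * in->sensor_latency - in->sensor_y + in->wheelbase;
    w.rmin = pmin + in->speed * in->sensor_latency - in->sensor_y;
    return w;
}
```

Under these definitions the width of the window is always speed * cycle_time + wheelbase, so a faster sensor or a shorter cycle time narrows the band of the image that has to be processed.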


3. An impulse turn was chosen as the maneuver upon which to base adaptive regard because a vehicle which
stops when it sees an obstacle (panic stop maneuver) is not useful when obstacles are dense. The impulse turn
is a turn from zero curvature to the maximum. Other distinguished maneuvers are the turning stop and the
curvature reverse.
3.2 Range Window Processing - Second Phase

In  the  second  phase  of  adaptive  perception,  adaptive  scan  is  implemented  by  simply  skipping 
columns  in  the  image.  Up  to  7  columns  in  8  can  be  skipped  for  typical  geometry  and  square  pixels. 
This  maintains  guaranteed  throughput  and  essentially  uniform  coverage.  More  sophisticated  forms 
of  adaptive  scan  were  attempted,  but  the  cost  of  computing  the  imaging  Jacobian  outweighed  the 
gains  achieved,  so  the  simpler  constant  technique  survives.  It  would  be  a  simple  matter  to  filter  the 
input  image  in  azimuth  only,  but  it  has  not  been  implemented  at  this  point,  because  it  seems 
unnecessary  on  the  ERIM  sensor. 

By  the  small  incidence  angle  assumption,  the  projection  of  the  sensor  range  onto  the  groundplane 
is  essentially  the  groundplane  y  coordinate.  However,  terrain  roughness  and  nonzero  vehicle  roll 
mean  that  the  position  of  the  range  window  in  the  image  is  different  for  each  column.  Thus  the 
range  window  is  processed  on  a  per  column  basis.  A  conceptual  code  fragment  is  as  follows: 

/*
** Process Range Window
*/
int i, j;

j = image->start_col;
while ( j < image->end_col )
{
    i = image->end_row;                 /* start at the bottom of the window */
    while ( i > image->start_row )
    {
        if ( range(i,j) > Rmax ) break;                   /* past the window: stop this column */
        else if ( range(i,j) < Rmin ) { i--; continue; }  /* not yet in the window */
        else process_pixel_into_map( );
        i--;
    }
    j += image->col_skip;               /* adaptive scan: skip columns */
}

The  monotone  range  assumption  appears  as  the  break  statement  after  the  first  conditional.  The 
start_col  and  end_col  etc.  variables  implement  a  fixed  azimuth  and  elevation  angle  window  within 
which  the  range  window  always  lies  on  typical  terrain.  Ignoring  the  constant  overhead  of  actually 
processing  a  pixel  and  indexing  into  an  image  etc.,  the  algorithm  is  implemented  in  about  10  lines 
of  code.  Its  significance  is: 

•  It  is  a  theoretically  sound  solution  to  the  throughput  problem.  Therefore,  it  questions  the  need 
for  expensive  hard  to  use  parallel  processing  in  high  speed  autonomy.  It  makes  20  mph 
autonomy  feasible  on  throughput  grounds  and  is  four  orders  of  magnitude  more  efficient  than 
reactive  approaches  which  process  an  entire  2  Hz  image  at  that  speed. 

•  It  eliminates  the  image  registration  problem  because  the  systematic  errors  which  cause  the 
problem  vary  little  over  the  small  width  of  the  range  window. 

•  It  amounts  to  a  computational  solution  to  the  stabilization  problem  within  the  limits  of  the 
sensor  field  of  view.  For  limited  field  of  view  sensors,  it  provides  an  obvious  basis  for  the 
generation  of  sensor  pointing  commands  which  keep  the  average  range  window  centered  in 
the image.

•  Along  with  adaptive  regard,  it  solves  the  sampling  problem  for  practical  purposes  because  the 
range  ratio  is  very  low  over  the  small  width  of  the  range  window.  It  theoretically  guarantees 
throughput, response, and acuity within the limits of the vehicle and the sensor. It actively
guarantees vehicle safety and adapts to vehicle speed, attitude and terrain shape implicitly.
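
The column scan of the second phase can be exercised on a synthetic image with a sketch like the following. The `RangeImage` layout, `range_at`, and `scan_range_window` are hypothetical names; the function only counts the pixels that fall inside the window, where a real system would convert each one to world coordinates and write it into the terrain map.

```c
#include <stddef.h>

/* Hypothetical range image. Row 0 is the top of the image; ranges are
 * assumed to grow as the row index decreases (monotone range assumption). */
typedef struct {
    int rows, cols;
    int start_row, end_row;   /* elevation angle window */
    int start_col, end_col;   /* azimuth window */
    int col_skip;             /* adaptive scan: visit one column in col_skip (>= 1) */
    const double *range;      /* rows * cols range values, row-major, metres */
} RangeImage;

static double range_at(const RangeImage *img, int i, int j)
{
    return img->range[(size_t)i * (size_t)img->cols + (size_t)j];
}

/* Returns the number of pixels whose range lies in [rmin, rmax]. */
int scan_range_window(const RangeImage *img, double rmin, double rmax)
{
    int processed = 0;
    for (int j = img->start_col; j < img->end_col; j += img->col_skip) {
        for (int i = img->end_row; i > img->start_row; i--) {
            double r = range_at(img, i, j);
            if (r > rmax) break;      /* everything above is farther still */
            if (r < rmin) continue;   /* read, compare, move up the column */
            processed++;              /* stand-in for processing into the map */
        }
    }
    return processed;
}
```

Pixels below the window cost one read and one comparison, and pixels above it are never visited, which is where the savings over complete image processing come from.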
3.3 Typical Example

In  a  typical  image,  the  pixels  that  are  actually  processed  form  a  jagged  edged  band  across  the 
horizontal.  The  width  of  the  band  decreases  quickly  as  the  vehicle  speed  increases  and  adaptive 
lookahead moves the window up the image. However, the validity of the small incidence angle
assumption^4 guarantees that no matter what the terrain shape is and no matter what the vehicle
attitude is, adaptive perception will generate a perfect wedge of geometry which is exactly the
requirement for the current planning cycle.

The  following  figure  gives  a  sequence  of  range  images  for  a  run  of  the  RANGER  simulator  on  very 
rough  terrain  using  a  simulated  ERIM  rangefinder  where  the  pixels  that  were  actually  processed 
are  highlighted  as  vertical  white  lines.  On  average,  even  in  this  worst  case,  only  200  range  pixels 
out  of  the  available  10,000  (or  2%)  were  processed  per  image.  Thus,  the  2%  geometric  efficiency 
of the sensor is effectively increased to 100% and throughput is increased by a factor of 50,
or nearly two orders of magnitude. The sparsity of the data in the planner window is interpolated away
internally.


[Figure 4 - Adaptive Rangefinder Perception: Range Image Sequence]


Notice that the algorithm adapts automatically to vehicle roll in the fourth image and that it
processes the hill obstacle completely in the third image as soon as it appears. Notice also the skew
of the vehicle icon with respect to the wedge of paths in the planner window. This is an aspect of
the  steering  feedforward.  The  data  is  shifted  up  out  of  the  image  completely  in  the  last  image  on 
the  right  side  because  of  vehicle  roll. 


4. Of course, it is the relative accuracy of the small angle assumption across two images which really matters
for throughput reasons, so the continuity assumption makes the small angle assumption even more valid.
4. Terrain Map Management

Consider  the  implications  of  satisfaction  of  guaranteed  response  and  acuity  on  the  memory  and 
computation  required  to  manage  a  traditional  terrain  map.  Up  to  30  meters  of  lookahead  is  not 
uncommon  and  resolutions  on  the  order  of  1/6  meter  are  theoretically  necessary.  Using  only  20 
bytes  of  memory  per  map  cell,  over  1/2  megabyte  of  memory  is  required  to  store  a  typical  map.  If 
this  map  is  stored  as  a  physically  coherent  block  of  memory,  it  must  be  physically  shifted  and 
copied  after  the  acquisition  of  each  image.  Using  a  realistic  frame  rate  of  10  Hz,  which  is  necessary 
to guarantee detection, the overhead involved in simply storing and managing the environmental
model  is  not  justified  in  a  real  time  system. 
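
The memory figure can be checked with a short calculation. The sketch below assumes a square map 30 meters on a side at 6 cells per meter (1/6 meter resolution); the function and parameter names are illustrative.

```c
#include <stddef.h>

/* Bytes needed to store a square terrain map covering extent_m metres on a
 * side at cells_per_m cells per metre. */
size_t map_bytes(size_t extent_m, size_t cells_per_m, size_t bytes_per_cell)
{
    size_t cells_per_side = extent_m * cells_per_m;
    return cells_per_side * cells_per_side * bytes_per_cell;
}

/* Bytes moved per second if the whole map is shifted and copied once per image. */
size_t copy_bytes_per_second(size_t bytes, size_t frame_rate_hz)
{
    return bytes * frame_rate_hz;
}
```

Under these assumptions, map_bytes(30, 6, 20) gives 180 x 180 x 20 = 648,000 bytes, and copying that block at 10 Hz moves about 6.5 megabytes per second before any perception work is done.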


The  solution  to  this  problem  is  a  classical  one  from  computer  science  -  the  ring  buffer.  However, 
there is no intrinsic requirement to insert and delete nodes into a linked structure because the map
is spatially coherent and of uniform resolution. Hence, a simple array accessed with modulo
arithmetic suffices to logically scroll the map as the vehicle moves.

The  operating  principle  of  the  wrappable  terrain  map  is  that  the  indices  into  the  array  are 
determined  by  modulo  arithmetic  as  follows: 


The operator ⌊x⌋ is the floor (greatest integer) function and rem(x/y) is the floating point remainder
function. These are implemented more efficiently than a function call in the code^5. The operation
of the technique when applied to three successive images is indicated below.


This approach creates new problems, the most serious of which is that the mapping from world
coordinates to map indices is many-to-one and therefore the inverse mapping is not a function.
In mathematical terms, the coordinate transform is not one-to-one.


5. By assuming excursions are limited to a few times the distance to the moon!
An infinity of points in global coordinates all map onto a single cell in the map, so remnants of
images of arbitrary age may remain in the map indefinitely. Suppose the elevation at the point (15,
25) is needed and the map is 10 by 10. Then the point (5, 15) may also be in the map. A query for
the elevation at (15, 25) may get the elevation at (5, 15) instead.

The system manages this problem in a very simple way. Although all data remains in the map until
it is overwritten, each entry is tagged with the distance that the vehicle had travelled since the start
of the mission at the time the pixel was measured. If the “age” of the last update is too old, the cell
is considered to be empty^. This technique eliminates all of the overhead of map management
except for the coordinate transformation necessary to access it. If pixels older than the length of the
map are discarded in this way, it is impossible for old data to poke through the holes in new data.
The  impact  is: 

•  it  is  never  necessary  to  perform  a  copy  of  the  map  data  structure 

•  graphical  output  had  to  be  modified  to  wrap  around  in  the  map  window 

Theoretically, the complexity of map management is quadratic in the resolution unless the map is
wrappable. The cost of managing the wrappable map is independent of both its size and its
resolution.
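
The age test can be sketched as follows. The `MapCell` layout and the function names are hypothetical; the only idea taken from the text is that each cell carries the distance travelled at the time it was written and is treated as empty once that tag is older than the length of the map.

```c
#include <stdbool.h>

typedef struct {
    double elevation;      /* terrain elevation stored in the cell, m */
    double odometer_tag;   /* distance travelled when the cell was written, m */
    bool   written;        /* false until the cell is first written */
} MapCell;

void write_cell(MapCell *cell, double elevation, double odometer)
{
    cell->elevation = elevation;
    cell->odometer_tag = odometer;
    cell->written = true;
}

/* Returns true and stores the elevation if the cell holds data younger than
 * map_length metres of travel; otherwise the cell is treated as empty. */
bool read_cell(const MapCell *cell, double odometer, double map_length,
               double *elevation_out)
{
    if (!cell->written) return false;
    if (odometer - cell->odometer_tag > map_length) return false;
    *elevation_out = cell->elevation;
    return true;
}
```

Nothing is ever erased or copied in this scheme; stale cells simply fail the test when they are read.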

4.1 Delayed Interpolation

The  terrain  map  is  not  interpolated  at  all.  This  is  done  because  the  interpolation  of  the  map  requires 
a  complete  traversal  and  this  is  too  expensive.  Instead,  the  responsibility  for  interpolation  is  left 
with  the  users  of  the  map.  Spatial  interpolation  is  wasteful  because  the  ultimate  use  of  the  map  is 
to  evaluate  vehicle  safety  and  vehicle  safety  can  be  expressed  as  a  time  signal. 

It  is  more  efficient  to  delay  interpolation  until  the  point  in  the  computation  at  which  it  is  really 
needed,  and  this  point  occurs  inside  the  tactical  controller.  Also,  only  a  small  portion  of  the  map  is 
actually  used  in  some  situations  because  the  vehicle  maneuverability  is  limited.  Any  interpolation 
of  unused  geometry  amounts  to  a  waste  of  resources. 

A  further  aspect  of  the  interpolation  problem  is  that  occlusion  is  inevitable  anyway,  so  spatial 
interpolation  can  never  succeed  fully  without  unjustified  and  harmful  smoothness  assumptions  on 
rough  terrain. 

Thus  the  tactical  controller  interpolates  in  time  instead  of  in  space.  Throughout  the  internal 
processing  of  the  tactical  controller,  the  central  data  structure  is  a  time  signal  which  may  or  may 
not  be  known  at  a  particular  point  in  time.  The  system  is  robust  by  design  to  unknown  signal  values 
and,  as  a  by-product  of  its  processing,  computes  an  assessment  of  how  much  geometry  is  actually 
unknown  and  reacts  accordingly.  In  this  way,  interpolation  and  occlusion  are  treated  in  a  unified 
way  and  the  system  considers  too  much  of  either  to  be  hazardous. 
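
One way to picture interpolation in time is a sampled signal in which unknown values are marked explicitly. The sketch below is an assumption-laden illustration, not the tactical controller's algorithm: the function names, the use of `None` for unknown samples, linear filling, and the `max_gap` limit are all hypothetical.

```python
def interpolate_in_time(signal, max_gap):
    """Linearly fill runs of None no longer than max_gap samples that have
    known values on both sides; longer runs stay unknown."""
    out = list(signal)
    n = len(out)
    k = 0
    while k < n:
        if out[k] is not None:
            k += 1
            continue
        start = k
        while k < n and out[k] is None:
            k += 1
        gap = k - start
        if start > 0 and k < n and gap <= max_gap:
            a, b = out[start - 1], out[k]
            for m in range(gap):
                out[start + m] = a + (b - a) * (m + 1) / (gap + 1)
    return out


def unknown_fraction(signal):
    """Fraction of samples still unknown (0.0 for an empty signal)."""
    if not signal:
        return 0.0
    return sum(v is None for v in signal) / len(signal)


# Elevation under one wheel along a candidate trajectory (made-up values).
raw = [1.0, None, None, 4.0, None, None, None, None, 2.0]
filled = interpolate_in_time(raw, max_gap=2)
too_much_unknown = unknown_fraction(filled) > 0.3   # threshold is illustrative
```

Short gaps and long occlusions pass through the same code path; only the remaining unknown fraction decides whether the trajectory is treated as hazardous.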

Image Registration

A  simple  image  registration  algorithm  is  used  in  situations  where  the  imaging  density  is  reduced 
below  the  amount  necessary  to  ensure  that  the  geometry  under  the  rear  wheels  comes  from  the 
same  image  as  the  front  wheels  in  the  feedforward  simulation. 


6. Age is measured by distance and not time because otherwise the geometry under the wheels eventually
disappears when the vehicle stops.

This module recovers the vehicle excursion between images by matching the overlapping regions
of  consecutive  images.  Currently,  only  the  elevation  (z)  coordinate  is  matched  and  this  works  best 
in  practice  due  to  systematic  errors  which  are,  as  yet,  unidentified.  When  the  z  deviation  of  two 
consecutive  images  is  computed,  it  is  applied  to  all  incoming  geometry  samples  in  order  to  remove 
the  mismatch  error. 
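
A z-only registration of this kind can be sketched as below. The representation of an image as a dictionary from map cell indices to elevation, and the function names `z_offset` and `register`, are assumptions for the example; the source does not specify how the overlap is found.

```python
def z_offset(previous, incoming):
    """Mean elevation difference over cells present in both images.
    Each image is a dict mapping (i, j) cell indices to elevation."""
    shared = previous.keys() & incoming.keys()
    if not shared:
        return 0.0          # assumption: no overlap means no correction
    return sum(incoming[c] - previous[c] for c in shared) / len(shared)


def register(previous, incoming):
    """Subtract the mean z mismatch from every incoming sample."""
    dz = z_offset(previous, incoming)
    return {cell: z - dz for cell, z in incoming.items()}


previous = {(0, 0): 1.0, (0, 1): 2.0}
incoming = {(0, 0): 1.5, (0, 1): 2.5, (0, 2): 3.5}
corrected = register(previous, incoming)   # every elevation lowered by 0.5
```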

Fusion/Accumulation

After  the  mean  mismatch  error  is  removed,  there  are  still  random  errors  in  the  elevation  data.  The 
map  manager  computes  mean  and  standard  deviation  statistics  for  map  cells  in  situations  when  two 
or  more  range  image  pixels  from  the  same  image  or  sometimes  from  consecutive  images  fall  into 
the  same  map  cell.  In  order  to  do  this,  each  map  cell  has  an  image  frame  number  tag  as  well  as  the 
age  tag.  When  the  incoming  frame  number  differs  from  the  stored  one,  the  statistical  accumulators 
are  zeroed. 

The  value  of  this  mechanism  in  computing  a  statistically  meaningful  result  is  highly  suspect. 
However,  it  survives  because  the  deviations  computed  are  a  representation  of  the  slope  of  the 
terrain  within  a  single  cell. 
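
The per-cell accumulators and the frame-number reset can be sketched as follows. The class name `CellStats` and its fields are hypothetical, and the sketch resets on any change of frame number, so the occasional pooling across consecutive images mentioned above is not modelled.

```python
import math


class CellStats:
    """Running mean and standard deviation of elevation for one map cell."""

    def __init__(self):
        self.frame = None
        self.n = 0
        self.sum_z = 0.0
        self.sum_zz = 0.0

    def add(self, z, frame):
        if frame != self.frame:
            # New image: zero the accumulators before adding the sample.
            self.frame = frame
            self.n = 0
            self.sum_z = 0.0
            self.sum_zz = 0.0
        self.n += 1
        self.sum_z += z
        self.sum_zz += z * z

    def mean(self):
        return self.sum_z / self.n if self.n else None

    def std(self):
        if not self.n:
            return None
        variance = self.sum_zz / self.n - (self.sum_z / self.n) ** 2
        return math.sqrt(max(variance, 0.0))   # clamp rounding noise


cell = CellStats()
cell.add(1.0, frame=7)
cell.add(3.0, frame=7)     # same frame: statistics accumulate
spread = cell.std()        # reflects the slope within the cell
cell.add(5.0, frame=8)     # new frame: accumulators restart
```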
5.  Adaptive  Stereo  Perception


Adaptive  perception  is  also  possible  for  stereo  ranging  systems.  The  basic  principle  of  the  range 
window  can  be  converted  to  a  disparity  window  for  a  stereo  system  because  the  two  are  related 
by  the  stereo  baseline.  There  is  a  slight  difference  in  the  geometry  of  a  stereo  range  image 
compared  to  a  rangefinder  image.  The  first  is  based  on  perspective  geometry  and  the  second  is 
based  on  spherical  polar  geometry.  Therefore,  a  disparity  window  corresponds  to  a  window  on  the 
y  coordinate  and  not  the  true  polar  range.  However,  in  most  circumstances,  this  distinction  can  be 
safely  ignored. 


5.1 Basics of Stereo Perception

For  outdoor  terrain,  area  based  stereo  algorithms  are  typically  used  because  it  is  necessary  to 
estimate  the  range  of  every  pixel  in  the  image.  The  typical  steps  in  the  process  are: 


•  Preprocess the images to enhance texture and remove bias and scale variations across the
image. The output of this process is a normalized image which corresponds to each input
image.

•  For  each  candidate  disparity  considered,  for  a  window  around  each  pixel  in  the  first  image, 
compute  a  measure  of  correlation  between  it  and  a  window  around  the  pixel  in  the  second 
image  which  is  displaced  by  the  disparity  considered.  The  output  of  this  process  is  a  cube  of 
numbers of the form Corr[i,j,d] which will be called the correlation tensor.

•  The  curve  Corr[d]  obtained  by  fixing  the  row  and  column  indices  of  the  correlation  tensor 
will  be  called  the  correlation  curve.  For  each  pixel  in  the  first  image,  the  correlation  curve  is 
searched  to  find  its  maximum  value.  The  value  of  the  disparity  at  the  maximum  value  of  the 
correlation  curve  for  that  pixel  is  the  quantity  of  interest.  The  output  of  this  process  is  a 
disparity  image. 

•  For  each  pixel  in  the  disparity  image,  convert  disparity  to  range  using  the  stereo  baseline.  The 
output  of  this  process  is  the  range  image. 
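
The last three steps above can be sketched in a few lines. This is an illustrative toy, not the report's implementation: it omits the preprocessing step, uses a one-row window, scores matches with negative SSD for brevity, and assumes a left pixel at column j matches the right image at column j - d. All function names and the focal length value are hypothetical.

```python
NEG_INF = float("-inf")


def correlation_tensor(left, right, disparities, half):
    """corr[i][j][k] is the negative SSD between a (2*half+1)-wide row window
    around left[i][j] and one around right[i][j - disparities[k]].
    Windows that leave the image keep the score -inf."""
    rows, cols = len(left), len(left[0])
    corr = [[[NEG_INF] * len(disparities) for _ in range(cols)]
            for _ in range(rows)]
    for i in range(rows):
        for j in range(half, cols - half):
            for k, d in enumerate(disparities):
                if j - d - half < 0 or j - d + half >= cols:
                    continue
                ssd = sum((left[i][j + u] - right[i][j - d + u]) ** 2
                          for u in range(-half, half + 1))
                corr[i][j][k] = -ssd
    return corr


def disparity_image(corr, disparities):
    """Per pixel, the disparity at the maximum of its correlation curve."""
    image = []
    for row in corr:
        out_row = []
        for curve in row:
            best = max(range(len(curve)), key=curve.__getitem__)
            out_row.append(disparities[best] if curve[best] > NEG_INF else None)
        image.append(out_row)
    return image


def range_image(disparity, baseline, focal):
    """Convert disparity to range; zero or missing disparity gives None."""
    return [[baseline * focal / d if d else None for d in row]
            for row in disparity]


right = [[0, 0, 5, 9, 5, 0, 0, 0, 0, 0]]
left = [[0, 0, 0, 0, 5, 9, 5, 0, 0, 0]]     # same feature, shifted by 2
candidates = [0, 1, 2, 3]
corr = correlation_tensor(left, right, candidates, half=1)
disp = disparity_image(corr, candidates)
rng = range_image(disp, baseline=1.0, focal=500.0)   # focal in pixels, made up
```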

5.2 Stereo Triangulation


The  basic  stereo  triangulation  formula  for  perfectly  aligned  cameras  is  quoted  below  for  reference 
purposes. It can be derived simply from the principle of similar triangles.
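
For reference, the standard similar-triangles result for perfectly aligned cameras can be written as follows. The symbol names are assumptions chosen here: b is the stereo baseline, f the focal length, x_L and x_R the image coordinates of the same point in the left and right images, d the disparity, and Y the range along the optical axis.

```latex
d = x_L - x_R = \frac{b f}{Y}
\qquad\Longleftrightarrow\qquad
Y = \frac{b f}{d}
```

Disparity is therefore inversely proportional to range, which is why a window on range converts directly to a window on disparity.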

Figure 7 - Stereo Triangulation

5.3 Width of Disparity Window

Using  the  previous  result,  consider  the  width  of  the  disparity  window  which  corresponds  to  a 
typical  range  window.  It  is  most  useful  to  remove  the  dependence  on  the  focal  length  by  expressing 
disparity  as  an  angle  thus: 


Then,  for  a  range  window  between  25  meters  and  30  meters,  and  a  stereo  baseline  of  1  meter,  the 
angular  width  of  disparity  window  is: 

Thus, the range of disparities which correspond to a typical range window is very small indeed.
This implies that any process which robustly identifies global maxima of the disparity curve can
generate the range window in an image. Again, because of the validity of the small incidence
angle assumption, and because the window is based on the data itself, the process will
automatically adapt for rough terrain and vehicle attitude. Such a process implements adaptive
sweep inside the stereo algorithm.
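
The size of this window can be checked with a short calculation, assuming the standard relation between angular disparity and range for aligned cameras. The symbols are assumptions for this sketch: δ is disparity expressed as an angle, d the disparity, f the focal length, b the baseline, and Y the range.

```latex
\delta = \frac{d}{f} = \frac{b}{Y}

\Delta\delta
  = b\left(\frac{1}{Y_{\min}} - \frac{1}{Y_{\max}}\right)
  = 1\,\mathrm{m}\left(\frac{1}{25\,\mathrm{m}} - \frac{1}{30\,\mathrm{m}}\right)
  = \frac{1}{150}\,\mathrm{rad}
  \approx 6.7\,\mathrm{mrad}
  \approx 0.38^{\circ}
```

For a hypothetical focal length of 500 pixels, this corresponds to roughly 3.3 pixels of disparity.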

5.4 Essential Difficulty

Consider the following typical correlation curve.

[Figure: correlation curve Corr[d] plotted against disparity d]


If  the  search  for  the  maximum  is  confined  only  to  the  disparity  window,  then  a  local  maximum 
will  be  found  for  the  pixel,  and  not  the  correct  global  maximum.  This  implies  that  a  typical  image 
will  contain  pixels  and  regions  where  the  ranging  is  incorrect  as  well  as  the  correct  range  window 
as shown below:


Disparity  Image 


Figure  9  -  Spurious  Disparities 


Spurious  matches  occur  fundamentally  because  regions  which  do  not  correspond  physically 
actually  look  more  or  less  the  same.  This  is  called  the  repetitive  texture  problem. 
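
The difficulty can be reproduced with a made-up correlation curve. In the sketch below the curve values, the window limits, and the helper name `argmax` are all illustrative; the comparison against the full curve is shown only to expose the error and is exactly the computation a windowed search avoids.

```python
def argmax(curve, lo=0, hi=None):
    """Index of the largest value of curve[lo:hi]."""
    hi = len(curve) if hi is None else hi
    return max(range(lo, hi), key=curve.__getitem__)


# Correlation against disparity index for one pixel (made-up values).
curve = [0.2, 0.9, 0.3, 0.1, 0.4, 0.6, 0.5, 0.2]
window = (4, 7)                       # disparity window: indices 4, 5, 6

inside = argmax(curve, *window)       # best match inside the window
overall = argmax(curve)               # true global maximum
is_spurious = inside != overall       # the windowed answer is only local
```

Here the windowed search reports index 5 although the true match lies at index 1, outside the window.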

5.5 Solutions to the Repetitive Texture Problem

A  few  techniques  are  available  to  alleviate  the  problem  of  repetitive  texture  in  stereo. 

5.5.1  Trinocular  Stereo 

A  third  camera  or  even  more  cameras  help  the  situation  because  a  spurious  match  is  less  likely  to 
occur  in  all  possible  stereo  pairs  from  a  trinocular  stereo  head. 

5.5.2  Left-Right  Line  of  Sight 

The  correlation  of  left  against  right  and  right  against  left  provides  two  separate  correlation  curves. 
A  spurious  match  is  less  likely  to  show  up  in  both  curves. 
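
A consistency check of this kind can be sketched for one scanline as follows. The conventions are assumptions for the example: a left pixel at column j with disparity d matches right column j - d, missing matches are `None`, and the function name and `tolerance` parameter are hypothetical.

```python
def left_right_consistent(disp_left, disp_right, tolerance=1):
    """Keep a left-image disparity only if the right-image disparity at the
    matched column agrees with it to within tolerance."""
    kept = []
    for j, d in enumerate(disp_left):
        ok = False
        if d is not None:
            jr = j - d                         # matched column in right image
            if 0 <= jr < len(disp_right) and disp_right[jr] is not None:
                ok = abs(disp_right[jr] - d) <= tolerance
        kept.append(d if ok else None)
    return kept


disp_left = [None, None, 2, 2, 4, 2]    # the 4 is a spurious match
disp_right = [2, 2, 2, 2, None, None]
checked = left_right_consistent(disp_left, disp_right)
```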

5.5.3  Absolute  Correlation 

The  use  of  normalized  cross  correlation  as  the  matching  criterion  is  superior  to  the  use  of  the  sum 
of  squared  differences  or  SSD  criterion  because  the  absolute  value  of  the  correlation  coefficient 
actually  has  meaning.  For  example,  a  higher  SSD  may  result  from  higher  average  brightness  of 
pixels and may not imply a match at all. On the other hand, a local maximum in the correlation
curve can be assumed to be a global maximum when the correlation itself is high at the maximum.
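
The dependence of SSD on brightness, and the invariance of normalized cross correlation to it, can be seen on a toy example. The patch values are made up and the function names are chosen for this sketch.

```python
import math


def ssd(a, b):
    """Sum of squared differences; depends on absolute brightness."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ncc(a, b):
    """Normalized cross correlation in [-1, 1]; 0.0 for a flat patch."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return sum(x * y for x, y in zip(da, db)) / denom if denom else 0.0


patch = [10, 20, 10, 30]
same_texture_brighter = [x + 50 for x in patch]   # true match, brighter camera
different_texture = [30, 10, 20, 10]              # wrong match, same brightness

# SSD prefers the wrong patch; NCC scores the true match at 1.0.
ssd_true, ssd_wrong = ssd(patch, same_texture_brighter), ssd(patch, different_texture)
ncc_true, ncc_wrong = ncc(patch, same_texture_brighter), ncc(patch, different_texture)
```

Because the NCC value has an absolute scale, a threshold on it can be used to accept a maximum found inside the window.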

5.5.4  Large  Correlation  Window 

A  large  correlation  window  will  suppress  matches  from  small  regions  of  repetitive  texture. 
Unfortunately,  this  technique  has  the  side  effect  of  smoothing  the  range  image  as  well. 

5.5.5  Multidimensional  Maxima 

It  can  be  argued  that  the  peak  of  the  correlation  curve  is  only  one  of  three  possible  partial 
derivatives. The partial derivatives of correlation across the image axes are also meaningful. For
example, the partial across row index would be expected to be nearly flat as would the partial across
the column index. Both assumptions fail at occluding edges, but occluding edges cannot be
matched  anyway  so  it  is  legitimate  to  discard  them. 

5.5.6  Monotone  Range  Assumption 

The  monotone  range  assumption  can  be  used  successfully  in  outdoor  environments.  It  can  be 
implemented  by  removing  all  matches  in  a  column  which  are  not  part  of  the  longest  monotone  run 
of  disparity  values. 
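
A literal reading of this filter can be sketched as below. Several choices are assumptions made for the example: the column is listed from the bottom of the image upward, so disparity should not increase along it; runs must be contiguous; missing matches are `None`; and the function name is hypothetical. The filter actually used may differ.

```python
def keep_longest_monotone_run(column):
    """Keep the longest contiguous run of non-increasing disparities and
    replace every other entry with None."""
    best_start, best_len = 0, 0
    start = None                      # start of the current run, if any
    for k, d in enumerate(column):
        if d is None:
            start = None              # a missing match ends the run
            continue
        if start is None or d > column[k - 1]:
            start = k                 # a rise in disparity starts a new run
        if k - start + 1 > best_len:
            best_start, best_len = start, k - start + 1
    end = best_start + best_len
    return [d if best_start <= k < end else None
            for k, d in enumerate(column)]


column = [9, 8, 7, 6, 11, 5, 4]       # 11 is a spurious match
cleaned = keep_longest_monotone_run(column)
```

This version is conservative: the valid values after the spurious 11 are discarded along with it because they belong to a shorter run.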

5.6 Adaptive Scan

Adaptive  scan  is  somewhat  problematic  in  stereo  because  high  angular  resolution  provides  the 
texture  necessary  for  accurate  triangulation.  Adaptive  scan  can  be  simply  implemented  in  stereo 
by skipping columns in the input images. Note, however, that computation of correlation requires
that the entire image be prefiltered, so adaptive scan applies only to the later stages of stereo:
correlation, disparity and triangulation.

5.7 Typical Example

The  following  figures  illustrate  the  operation  of  adaptive  stereo  on  two  input  images.  The  initial 
input  images  appear  at  the  top.  The  normalized,  texture-enhanced  images  appear  below  the  input 
images.  The  disparity  image  is  shown  to  demonstrate  the  spurious  matches  which  are  a  by-product 
of  the  disparity  window  approach.  These  correspond  to  local  maxima  in  the  correlation  curve,  but 
there  is  no  information  available  to  detect  this.  Finally,  the  cleaned  up  range  image  is  presented.  It 
incorporates  an  efficient  filter  based  on  the  monotone  range  assumption  which  removes  the  local 
maxima  and  provides  a  clean  range  image  for  processing  by  the  rest  of  the  navigation  system. 